Bash script: find and replace uppercase characters in a string

Say we have a string like:
doSomething()
and we want to obtain:
do_something()
What is the best way to do this?
I've read documentation about string manipulation, but I can't find the right combination of commands.
Update:
After the discussion with @anubhava, I found a solution by installing GNU sed:
brew install gnu-sed
Then I can run the script like this:
s="doSomethingElse()"; gsed 's/[[:upper:]]/_\L&/g' <<< "$s"
output: do_something_else()

Using GNU sed you can do:
s='doSomethingElse()'
sed 's/[[:upper:]]/_\L&/g' <<< "$s"
do_something_else()
Or, with non-GNU (BSD) sed, pipe the result through tr (the character classes are quoted so the shell cannot glob-expand them):
sed 's/[[:upper:]]/_&/g' <<< "$s" | tr '[:upper:]' '[:lower:]'
do_something_else()
Or using perl:
perl -pe 's/[[:upper:]]/_\L$&/g' <<< "$s"
do_something_else()
Or using GNU awk:
awk -v RS='[[:upper:]]' -v ORS= '1; RT{printf "_%s", tolower(RT)}' <<< "$s"
do_something_else()
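If no external tool is wanted at all, a minimal pure-bash sketch (bash 4+ for the ,, case operator) can walk the string one character at a time. It mirrors the sed commands above, including putting an underscore before a leading capital (so DoThing becomes _do_thing). The function name to_snake is made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: camelCase -> snake_case without sed/perl/awk.
# For every uppercase character, emit "_" plus its lowercase form.
to_snake() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}                       # current character
    if [[ $c == [[:upper:]] ]]; then
      out+="_${c,,}"                 # ,, lowercases (bash 4+)
    else
      out+=$c
    fi
  done
  printf '%s\n' "$out"
}

to_snake 'doSomethingElse()'   # do_something_else()
```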

In bash 4:
$ s="doSomethingElse()"
$ while [[ $s =~ [[:lower:]]([[:upper:]]) ]]; do
> s=${s/[[:upper:]]/_${BASH_REMATCH[1],,}}
> done
$ echo "$s"
do_something_else()
First, the while loop tries to match a lowercase character immediately followed by an uppercase character, capturing the uppercase one. The parameter expansion then replaces the first uppercase character in the string with an underscore followed by the captured character, converted to lowercase by the ,, operator. The process repeats until no more lower/upper pairs are found.
If bash allowed capture groups in patterns, something hypothetical like
s=${s//([[:lower:]])([[:upper:]])/${BASH_PATMATCH[1]}_${BASH_PATMATCH[2],,}}
could work without a loop. As is, we need the extra step of using regular expression matches one match at a time to capture the letter to be lowercased.
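One caveat: the loop above rewrites the first uppercase character anywhere in the string, which is the captured one only when the string does not start with a capital. For DoSomethingElse() the leading D gets overwritten and the result comes out garbled. A sketch of one way around that (still bash 4+; the helper name camel_to_snake is hypothetical) replaces the exact lower/upper pair that was matched, so a leading capital is simply left alone:

```shell
#!/usr/bin/env bash
# Hypothetical helper: same regex loop, but substitute the matched pair
# itself (e.g. "oS" -> "o_s") instead of the first uppercase letter.
camel_to_snake() {
  local s=$1 pair lower upper
  while [[ $s =~ ([[:lower:]])([[:upper:]]) ]]; do
    pair=${BASH_REMATCH[0]}          # e.g. "oS"
    lower=${BASH_REMATCH[1]}         # e.g. "o"
    upper=${BASH_REMATCH[2]}         # e.g. "S"
    # Quoting "$pair" makes bash match it literally, not as a glob.
    s=${s/"$pair"/${lower}_${upper,,}}
  done
  printf '%s\n' "$s"
}

camel_to_snake 'doSomethingElse()'   # do_something_else()
camel_to_snake 'DoSomethingElse()'   # Do_something_else()
```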

Related

Is it possible to retrieve a string between two special characters from a text file using bash?

Let's say I have the following text file
test.txt
ABC_01:Testing-ABCDEFG
If I want to retrieve the string after colon, I will be using
awk -F ":" '/ABC_01/{print $NF}' test.txt
which will return Testing-ABCDEFG
But what should I do if I only want to retrieve the string after the colon and before the hyphen?
You are so close. That is where split() comes in, e.g.
awk -F: '/ABC_01/{ split($NF,arr,"-"); print arr[1] }' test.txt
Which will output
Testing
The GNU Awk User's Guide - String Manipulation Functions provides the details on split(). Give it a try and let me know if you have any further questions.
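Using Bash's built-in extended regex engine ([^-]* stops at the first hyphen, and lines that don't match print nothing):
#!/usr/bin/env bash
# usage: ./script.sh < test.txt
while read -r; do
[[ $REPLY =~ :([^-]*)- ]] && echo "${BASH_REMATCH[1]}"
done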
Using standard POSIX shell IFS field separators:
#!/usr/bin/env sh
while IFS=':-' read -r _ m _; do
echo "$m"
done
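When the line is already in a shell variable, the same slice can be taken with two POSIX parameter expansions and no regex at all. A minimal sketch; the function name between_colon_hyphen is hypothetical, and a line without a colon is returned unchanged up to its first hyphen.

```shell
#!/bin/sh
# Hypothetical helper: text after the first ":" and before the next "-".
between_colon_hyphen() {
  s=${1#*:}     # drop everything up to and including the first ":"
  s=${s%%-*}    # drop the first "-" and everything after it
  printf '%s\n' "$s"
}

between_colon_hyphen 'ABC_01:Testing-ABCDEFG'   # Testing
```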
Using (GNU) grep and look-around:
$ grep -oP '(?<=:)[^-]*(?=-)' file
Testing
Explained:
grep GNU grep supports PCRE and look-around
-o Print only the matched (non-empty) parts of a matching line
-P Interpret PATTERNS as Perl-compatible regular expressions
(?<=:) positive look-behind, i.e. preceded by a colon
[^-]* anything but a hyphen
(?=-) positive look-ahead, i.e. followed by a hyphen

bash extract version string & convert to version dot

I want to extract the version string (1_4_5) from my-app-1_4_5.img and then convert it into dotted form (1.4.5), without the file name. The version string will have three (1_4_5) or four (1_4_5_7) segments.
I have this one-liner working: ls my-app-1_4_5.img | cut -d'-' -f 3 | cut -d'.' -f 1 | tr _ .
I would like to know if there is a better way than piping the output through cut.
Here's an attempt with parameter expansion. I'm assuming you have a wildcard pattern you want to loop over.
for file in *-*.img; do
base=${file%.img}
ver=${base##*-}
echo "${ver//_/.}"
done
The construct ${var%pattern} returns the value of var with the shortest suffix matching pattern trimmed off. Similarly, ${var#pattern} trims the shortest prefix that matches pattern. In both cases, doubling the operator (%% or ##) trims the longest possible match instead. (These are POSIX-compatible parameter expansions, i.e. not Bash-only.) The construct ${var/pattern/replacement} replaces the first match of pattern in var with replacement; doubling the first slash replaces every match. (This one is Bash-only.)
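To make the operators concrete, here they are applied to a four-segment name in the style of the question (my-app-1_4_5_7.img), with each result in a comment:

```shell
#!/usr/bin/env bash
file="my-app-1_4_5_7.img"

echo "${file%-*}"     # shortest suffix "-*" removed:  my-app
echo "${file%%-*}"    # longest suffix "-*" removed:   my
echo "${file#*-}"     # shortest prefix "*-" removed:  app-1_4_5_7.img
echo "${file##*-}"    # longest prefix "*-" removed:   1_4_5_7.img
echo "${file/_/.}"    # first "_" replaced (bash):     my-app-1.4_5_7.img
echo "${file//_/.}"   # every "_" replaced (bash):     my-app-1.4.5.7.img
```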
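You can do it with sed, taking the version between the last dash and the .img extension so that multi-digit and four-segment versions also work:
sed -E 's/.*-([0-9]+(_[0-9]+)+)\.img$/\1/; s/_/./g' <<< my-app-1_4_5.img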
Assuming the version number will always be between the last dash and the file extension, you can use something like this in pure Bash:
name="file-name-x-1_2_3_4_5.ext"
version=${name##*-}
version=${version%%.ext}
version=${version//_/.}
echo "$version"
The code above will result in:
1.2.3.4.5
For a complete explanation of the parameter expansions used above, see the Bash Reference Manual, section 3.5.3 Shell Parameter Expansion.
Remove everything but 0 to 9, _ and newline and then replace all _ with .:
echo "my-app-1_4_5.img" | tr -cd '0-9_\n' | tr '_' '.'
Output:
1.4.5
With bash and a regex:
echo "my-app-1_4_5.img" | while IFS= read -r line; do [[ "$line" =~ [^0-9]([0-9_]+)[^0-9] ]] && echo "${BASH_REMATCH[1]//_/.}"; done
Output:
1.4.5
A slightly shorter variant
name=my-app-1_4_5.img
vers=${name//[!0-9_]}
$ echo "${vers//_/.}"
1.4.5
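Since the question says the version has exactly three or four segments, a bash regex can also validate that while extracting. A sketch, assuming the version always sits between the last "-" and a ".img" extension; the function name img_version is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a 3- or 4-segment version with dots,
# or return 1 (printing nothing) if the name does not fit the pattern.
img_version() {
  local re='-([0-9]+(_[0-9]+){2,3})\.img$'   # e.g. -1_4_5.img
  if [[ $1 =~ $re ]]; then
    local ver=${BASH_REMATCH[1]}
    printf '%s\n' "${ver//_/.}"
  else
    return 1
  fi
}

img_version my-app-1_4_5.img     # 1.4.5
img_version my-app-1_4_5_7.img   # 1.4.5.7
```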

Linux Bash: delete a line if the fields match exactly

I have something like this in a file named file.txt
AA.201610.pancake.Paul
AA.201610.hello.Robert
A.201610.hello.Mark
Now I only have the first three fields, in three variables:
field1="A"
field2="201610"
field3='hello'
I'd like to remove a line if its first three fields match these exactly; in the case above, only the third line should be removed from file.txt. Is there a way to do that, and can it be done in the same file (in place)?
I tried with:
sed -i /$field1"."$field2"."$field3"."/Id file.txt
but of course this removes both the second and the third line
I suggest using awk for this as sed can only do regex search and that requires escaping all special meta-chars and anchors, word boundaries etc to avoid false matches.
Suggested awk with non-regex matching:
awk -F '[.]' -v f1="$field1" -v f2="$field2" -v f3="$field3" '
!($1==f1 && $2==f2 && $3==f3)' file
AA.201610.pancake.Paul
AA.201610.hello.Robert
Use ^ to anchor the pattern at the beginning of the line. Also note that . in a regex means "any character", not a literal period. You have to escape it: either \. (be careful with shell escaping and the difference between single and double quotes) or [.]
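Putting that advice together, the question's sed command might look like this sketch, assuming the field values contain no other regex metacharacters (true for A, 201610 and hello). The question's GNU-only I flag is dropped, the helper name delete_matching_lines is hypothetical, and the demo reads stdin; with GNU sed you would add -i to edit file.txt in place, while BSD/macOS sed needs -i ''.

```shell
#!/usr/bin/env bash
field1="A" field2="201610" field3="hello"

# Hypothetical helper: drop lines that START with "A.201610.hello.".
# ^ anchors at column 1 and \. is a literal dot. Inside double quotes
# bash leaves \. alone, so sed receives /^A\.201610\.hello\./d
delete_matching_lines() {
  sed "/^${field1}\.${field2}\.${field3}\./d"
}

printf '%s\n' AA.201610.pancake.Paul AA.201610.hello.Robert A.201610.hello.Mark |
  delete_matching_lines
```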
Sed cannot do string matches, only regexp matches which becomes horrendously complicated to work around when you simply want to match a literal string (see Is it possible to escape regex metacharacters reliably with sed). Just use awk:
$ awk -v str="${field1}.${field2}.${field3}." 'index($0,str)!=1' file
AA.201610.pancake.Paul
AA.201610.hello.Robert
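The awk commands above print to stdout. Since the question also asks about changing the same file, here is a sketch of an in-place edit with any POSIX awk via a temporary file (GNU awk 4.1+ additionally offers -i inplace). The helper name remove_prefixed_lines is hypothetical, and the demo uses a temp file rather than a real file.txt:

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete, in place, every line of FILE that starts
# with the literal string PREFIX.
remove_prefixed_lines() {
  local file=$1 prefix=$2 tmp status
  tmp=$(mktemp) || return 1
  # Keep lines where PREFIX does not start at column 1; copy the result
  # back only if awk succeeded. cat (rather than mv) keeps FILE's
  # permissions. Note: -v expands backslash escapes in PREFIX.
  awk -v str="$prefix" 'index($0, str) != 1' "$file" > "$tmp" &&
    cat -- "$tmp" > "$file"
  status=$?
  rm -f -- "$tmp"
  return "$status"
}

field1="A" field2="201610" field3="hello"
demo=$(mktemp)
printf '%s\n' AA.201610.pancake.Paul AA.201610.hello.Robert A.201610.hello.Mark > "$demo"
remove_prefixed_lines "$demo" "${field1}.${field2}.${field3}."
cat "$demo"
```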
The question was about bash, so in bash:
#!/usr/bin/env bash
field1="A"
field2="201610"
field3='hello'
while IFS= read -r line
do
case "$line" in
"${field1}.${field2}.${field3}."*) ;;
*) printf '%s\n' "$line" ;;
esac
done < file.txt

Split a string and pick the uppercase substring

Consider the following example variables in bash:
PET="cat/DOG/hamster"
FOOD="soup/soup/PIZZA"
SUBJECT="MATH/physics/biology"
How can I split any of those strings by a slash, extract the part that's all uppercase and store it in a variable? For example, how would I take DOG out of the $PET variable and store it in an $OPTION variable?
I need a portable solution that works under bash and zsh specifically.
You could use tr to remove all characters that are not uppercase:
OPTION=$(tr -dc '[:upper:]' <<< "$PET")
Note that here-strings (<<< "$VARIABLE") are not POSIX; bash, zsh and ksh93 support them, but in a plain POSIX sh you'll have to echo the variable into tr:
OPTION=$(echo "$PET" | tr -dc '[:upper:]')
It sounds like only one portion of the string is uppercase, so you can skip the splitting part of the question. This should work in both zsh and bash (although it is not portable in the sense of POSIX compatibility):
$ echo "${PET//[^A-Z]}"
DOG
You can try something like this:
OPTION=$(gawk -F'/' '{for (i=1;i<=NF;i++) if ($i ~ /\<[A-Z]+\>/) print $i}' <<< "$PET")
If you prefer a pure bash solution, you can use the following code; the regex is anchored with ^ and $ so that only a field made entirely of uppercase letters matches:
#!/bin/bash
PET="cat/DOG/hamster"
IFSBK=$IFS
IFS='/'
for word in $PET; do
if [[ $word =~ ^[[:upper:]]+$ ]]; then
OPTION="$word"
fi
done
IFS=$IFSBK
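The loop above relies on word splitting of the unquoted $PET, which zsh does not do by default, so it is bash-specific. For the "bash and zsh" requirement, a sketch that avoids word splitting and uses only parameter expansion and case patterns should work in both shells (tested here under bash). The helper name first_upper_field is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first "/"-separated field made only of
# uppercase letters; return 1 if there is none.
first_upper_field() {
  local rest=$1 part
  while :; do
    part=${rest%%/*}                 # field before the first "/"
    case $part in
      ''|*[![:upper:]]*) ;;          # empty or has a non-uppercase char: skip
      *) printf '%s\n' "$part"; return 0 ;;
    esac
    case $rest in
      */*) rest=${rest#*/} ;;        # drop the field just checked
      *) return 1 ;;                 # no fields left
    esac
  done
}

PET="cat/DOG/hamster"
OPTION=$(first_upper_field "$PET")
echo "$OPTION"   # DOG
```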

How should I do if the pattern in awk cmd is a bash variable and contains special character?

Description: the one-line awk command below prints all lines after the matched line in my shell script.
#!/bin/bash
...
awk "f;/${PATTERN}/{f=1}" ${FILE}
Since ${PATTERN} may contain special characters, the command fails in that case.
Q1. How should I handle this kind of situation if a regex is used in awk?
Q2. Is it possible to use the raw string in this command instead of a regex such as /$PATTERN/, to avoid the special-character problem?
Close. It's better to pass shell variables in to awk with -v than to place them in the awk script directly.
awk -v pat="${PATTERN}" 'f; $0 ~ pat {f=1}' "${FILE}"
If ${PATTERN} is a plain string that must equal the whole line, you can use string comparison instead:
awk -v pat="${PATTERN}" 'f; $0 == pat {f=1}' "${FILE}"
or you can even handle non-regex substrings:
awk -v pat="${PATTERN}" 'f; index($0, pat) {f=1}' "${FILE}"
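One caveat with -v: awk processes backslash escape sequences in the assigned value, so a pattern such as C:\temp reaches awk with a tab in it. Passing the string through the environment and reading it with ENVIRON (part of POSIX awk) avoids that. A minimal sketch; the helper name print_after_literal and the variable name PAT are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line after the first line that contains
# a literal substring. The pattern travels through the environment (PAT),
# so awk sees it byte-for-byte; -v would turn \t into a tab.
print_after_literal() {
  local pattern=$1 file=$2
  PAT=$pattern awk 'f; index($0, ENVIRON["PAT"]) {f=1}' "$file"
}

demo=$(mktemp)
printf '%s\n' 'header' 'C:\temp (a+b)*' 'line one' 'line two' > "$demo"
print_after_literal 'C:\temp (a+b)*' "$demo"   # prints: line one, line two
rm -f -- "$demo"
```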
